# Dynamic Resolution Processing

Internvl3 38B Instruct GGUF

InternVL3-38B-Instruct is an advanced Multimodal Large Language Model (MLLM) that demonstrates exceptional overall performance, with strong multimodal perception and reasoning capabilities.

BiQwen2 is a visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, focusing on efficient visual document retrieval.

Safetensors English

Qwen2.5 VL Instruct 3B Geo

Qwen2.5-VL is the latest vision-language model in the Qwen family, focusing on enhanced visual understanding and agent capabilities.

Transformers English

Colqwen2.5 3b Multilingual V1.0 Merged

A multilingual visual retrieval model based on Qwen2.5-VL-3B-Instruct and the ColBERT strategy. It supports dynamic input image resolution and generates ColBERT-style multi-vector representations of text and images.

Transformers Supports Multiple Languages
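
As a rough illustration of what "ColBERT-style multi-vector representations" means for retrieval, the sketch below scores a query against document pages with late interaction (MaxSim): each query vector is matched to its most similar page vector, and the per-vector maxima are summed. This is a minimal sketch and not the model's actual API. The function names `maxsim_score` and `rank_documents` are hypothetical, the 2-D vectors are toy data standing in for real embeddings, and L2-normalizing the vectors before the dot product is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """L2-normalize a vector (assumption: similarity is cosine-like)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score (hypothetical helper).

    For every query vector, take the highest similarity to any
    document vector, then sum those maxima over the query.
    """
    if not query_vecs or not doc_vecs:
        return 0.0
    q = [_normalize(v) for v in query_vecs]
    d = [_normalize(v) for v in doc_vecs]
    return sum(max(_dot(qv, dv) for dv in d) for qv in q)


def rank_documents(
    query_vecs: Sequence[Vector], docs: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Score every document and return (doc_id, score) pairs, best first."""
    scored = [(doc_id, maxsim_score(query_vecs, vecs)) for doc_id, vecs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: two query token vectors, two "pages" of patch vectors.
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        "page_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        "page_b": [[1.0, 1.0]],
    }
    for doc_id, score in rank_documents(query, pages):
        print(f"{doc_id}: {score:.4f}")
```

In a real pipeline the per-token and per-patch vectors would come from the retrieval model itself. Because the number of page vectors grows with the number of image patches, a model that accepts dynamic input resolution produces a variable-length list of vectors per page, which is why the scoring function above accepts sequences of any length.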

Qwen2.5 VL 72B Instruct AWQ Fix

Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring strong visual understanding and agent capabilities, with support for visual localization in multiple formats and structured output generation.

Transformers English

Colqwen2.5 7b Multilingual V1.0

A multilingual visual retrieval model based on Qwen2.5-VL-7B-Instruct and the ColBERT strategy, ranked first on the ViDoRe benchmark.

Text-to-Image Supports Multiple Languages

Colqwen2.5 3b Multilingual V1.0

A multilingual visual retrieval model based on Qwen2.5-VL-3B-Instruct and the ColBERT strategy, with strong results on the ViDoRe benchmark.

Text-to-Image Supports Multiple Languages

Qwen2.5 VL 72B Instruct Pointer AWQ

Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring enhanced visual understanding, agent capabilities, and structured output generation.

Transformers English

UGround is a strong GUI visual grounding model trained with a simple recipe, jointly developed by OSU NLP and Orby AI.

Multimodal Fusion

Transformers English

ColQwen2 is a visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, designed for efficient indexing of document visual features.

Safetensors English

A visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, capable of efficiently indexing documents through their visual features.

Text-to-Image English
